AITopics | high quality data

high quality data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is artificial intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Assessing the Role of Data Quality in Training Bilingual Language Models

Seto, Skyler, ter Hoeve, Maartje, de Seyssel, Maureen, Grangier, David

arXiv.org Artificial Intelligence, Jun-17-2025

Bilingual and multilingual language models offer a promising path toward scaling NLP systems across diverse languages and users. However, their performance often varies wildly between languages: prior work shows that adding more languages can degrade performance for some languages (such as English) while improving it for others (typically more data-constrained languages). In this work, we investigate the causes of these inconsistencies by comparing bilingual and monolingual language models. Our analysis reveals that unequal data quality, not just data quantity, is a major driver of performance degradation in bilingual settings. We propose a simple yet effective data filtering strategy that selects higher-quality bilingual training data using only high-quality English data. Applied to French, German, and Chinese, our approach improves monolingual performance by 2-4% and reduces bilingual model performance gaps to 1%. These results highlight the overlooked importance of data quality in multilingual pretraining and offer a practical recipe for balancing performance.
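The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of reference-based quality filtering, not the authors' method. It assumes every document has already been mapped to a vector by some shared multilingual encoder (not specified here; the vectors below are toy values). Each candidate non-English document is scored by cosine similarity to the centroid of high-quality English reference documents, and the top fraction is kept. The function names and the centroid-similarity criterion are illustrative assumptions; a trained quality classifier would be a common alternative.

```python
import math
from typing import List, Sequence

# Assumption for illustration: each document is already embedded as a
# fixed-length vector by a shared multilingual encoder, so English and
# non-English documents live in the same vector space.
Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_reference(
    reference: List[Vector], candidates: List[Vector], keep_fraction: float
) -> List[int]:
    """Return the indices (in original order) of the candidates closest to the
    centroid of the high-quality reference set, keeping about keep_fraction of them."""
    centroid = mean_vector(reference)
    scores = [cosine(c, centroid) for c in candidates]
    k = max(1, round(keep_fraction * len(candidates)))
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy 2-D "embeddings": two high-quality English references and four candidates.
    english_reference = [[1.0, 0.0], [0.9, 0.1]]
    candidates = [[1.0, 0.05], [0.0, 1.0], [0.7, 0.3], [-1.0, 0.0]]
    print(filter_by_reference(english_reference, candidates, keep_fraction=0.5))
```

With these toy vectors the script prints [0, 2]: the two candidates pointing in roughly the same direction as the English centroid are kept. In practice the keep fraction would need tuning, and the reference set would have to be large enough for its centroid to be meaningful.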

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2506.12966

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key

Chen, Yingda, Wang, Xingjun, Huang, Jintao, Mao, Yunlin, Zhang, Daoze, Zhao, Yuze

arXiv.org Artificial Intelligence, Oct-15-2024

As large language models rapidly evolve to support longer context, there is a notable disparity in their capability to generate output at greater lengths. A recent study suggests that the primary cause of this imbalance may be the lack of long-output data during alignment training. In light of this observation, attempts have been made to re-align foundation models with data that fills this gap, resulting in models capable of generating lengthy output when instructed. In this paper, we explore the impact of data quality in tuning a model for long output, and the possibility of doing so starting from human-aligned (instruct or chat) models. With careful data curation, we show that it is possible to achieve similar performance improvements in our tuned models with only a small fraction of the training data instances and compute. In addition, we assess the generalizability of such approaches by applying our tuning recipes to several models. Our findings suggest that, while the capacity for generating long output varies across models out of the box, our approach of tuning them with high-quality data using light compute consistently yields notable improvement across all the models we experimented on. We have made public our curated dataset for tuning long-writing capability, the implementations of model tuning and evaluation, and the fine-tuned models, all of which are openly accessible.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2410.1021

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Getting Deep Learning working in the wild: A Data-Centric Course - KDnuggets

#artificialintelligence, Apr-27-2022, 22:20:34 GMT

Have you been excited by recent high-profile deep learning successes, but not sure how to practically keep deep learning models working for your project? We've developed a distilled set of materials on data-centric deep learning approaches, which are often among the most impactful tools for getting deep learning models working on new tasks. Data-centric deep learning is a relatively new area and a broad term. For us, being data-centric means taking a different perspective on deep learning, one centered on building and maintaining the datasets that define and evaluate deep learning models. The real-world applications and successes of deep learning systems are growing by the day.

data-centric course, deep learning, learning

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Smart Water: Data Labeling with Active Learning And H2O.ai

#artificialintelligence, May-16-2021, 04:05:08 GMT

Data is the food for AI. For machine learning, or more precisely supervised learning, gold labels are key for models to recognize the patterns within the data. However, in real-world settings it is usually hard to obtain large amounts of labeled data, for example for search relevance, news topics, or autopilot. Recently, Andrew Ng gave a talk, "MLOps: From Model-centric to Data-centric AI", where he discussed the idea of moving from big data to good data. Good data is labeled consistently and covers the important cases.
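Active learning reduces labeling cost by asking humans to label only the examples the current model finds hardest. Below is a minimal sketch of one common query strategy, entropy-based uncertainty sampling; it is a generic illustration and does not use or reproduce any H2O.ai API. It assumes a model trained on the current labeled set has already produced class probabilities for each unlabeled example (the example IDs and probabilities are hypothetical), and it selects the examples with the most uncertain predictions to send to annotators.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(pool_probs: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the IDs of the `budget` unlabeled examples with the highest
    predictive entropy, i.e. the ones the current model is least certain about."""
    ranked = sorted(pool_probs, key=lambda ex_id: entropy(pool_probs[ex_id]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for a binary task
    # (e.g. relevant vs. not relevant in a search-relevance dataset).
    unlabeled_pool = {
        "doc_a": [0.98, 0.02],
        "doc_b": [0.55, 0.45],
        "doc_c": [0.70, 0.30],
        "doc_d": [0.50, 0.50],
    }
    # Send the two most uncertain examples to human annotators.
    print(select_for_labeling(unlabeled_pool, budget=2))
```

This prints ['doc_d', 'doc_b']. A full active-learning loop would add the new labels to the training set, retrain, re-score the remaining pool, and repeat until the labeling budget is spent or accuracy stops improving.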

platform, prediction, unlabeled data

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Navigate data management challenges to enable AI initiatives

#artificialintelligence, Dec-10-2020, 13:50:36 GMT

Smart data management is the foundation of organisation-wide usage of artificial intelligence. Leading organisations are able to fully leverage the power of artificial intelligence and generate value by giving data professionals access to well-organised, high-quality data from across the entire organisation. But how can this be achieved?

The Deloitte AI Loop (DAIL)

The Deloitte AI Loop provides a framework that mimics the human approach in the space of artificial intelligence. Based on our experience in bringing cognitive solutions to our clients, we have outlined DAIL as a blueprint for all aspects that should be covered in a successful AI solution, as we explained in the introductory blog. This is the second article of the DAIL series, focusing on the SENSE component, which consists of the tools, technology and infrastructure to measure, capture and monitor data from business processes, behavior and the environment.

enable ai initiative, high quality data, navigate data management challenge

#artificialintelligence

Industry: Information Technology > Security & Privacy (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

AI Development with Bottos: A Simple Use Case – Bottos – Medium

#artificialintelligence, Nov-16-2018, 23:45:34 GMT

Bottos will soon offer great opportunities to support the development of artificial intelligence, the most important step being its data and model marketplace. Thanks to the underlying blockchain infrastructure and other tools like smart contracts, users will be able to monetize their efforts to produce, clean, and ultimately sell their data safely and conveniently. Bottos will be a great companion to everyone involved in the development of AI models and programs. Karen is a computer scientist who cares greatly for her grandmother, who is increasingly frail and in need of assistance. While driving, Karen comes up with an interesting idea for an image and speech recognition system that, with the right development, may help seniors live autonomously in their own homes for longer, without moving to a retirement home, while reducing reliance on costly nursing services.

artificial intelligence, social media, speech recognition

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.55)
Information Technology > Communications > Social Media (0.49)

Healthcare's Best Shot At Doing AI Right: Make It Invisible

#artificialintelligence, Jul-3-2018, 04:16:34 GMT

And it won't replace your radiologist. That said, I agree with Curtis Langlotz, MD, PhD, of Stanford, who said at RSNA this year that radiologists who use AI will replace radiologists who don't. So, what is the path toward making AI a key enabler for medicine? AI-powered healthcare requires three key factors: sound data science, sharp focus and strategic deployment. It also requires the patience to balance the excitement of advanced digital technology with the practical realities of how healthcare operates.

artificial intelligence, healthcare, radiologist

#artificialintelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.05)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.98)
Health & Medicine > Nuclear Medicine (0.82)
Health & Medicine > Surgery (0.76)
Health & Medicine > Health Care Providers & Services (0.74)

Technology: Information Technology > Artificial Intelligence (1.00)

Barrier to AI in the Enterprise: Access to High Quality Data - AI Trends

#artificialintelligence, Apr-8-2018, 19:37:47 GMT

According to a recent Teradata study, 80% of IT and business decision-makers have already implemented some form of artificial intelligence (AI) in their business. The study also found that companies want to increase AI spending. Forty-two percent of respondents to the Teradata study said they thought there was more room for AI implementation across the business, and 30% said their organizations weren't investing enough in AI. Forrester recently released its 2018 Predictions and also found that firms are interested in investing in AI. Fifty-one percent of its 2017 respondents said their firms were investing in AI, up from 40% in 2016, and 70% of respondents said their firms will have implemented AI within the next 12 months. While the interest in investing in and growing AI implementation is there, 91% of respondents to the Teradata survey said they expect barriers to get in the way of investing in and implementing AI.

ai implementation, high quality data, respondent

#artificialintelligence

Genre: Questionnaire & Opinion Survey (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

The NHS is a much bigger challenge for DeepMind than Go

#artificialintelligence, Aug-24-2017, 10:30:17 GMT

People have a weird obsession with games like Chess and Go. Achievement in them has long been seen as a marker of human intellect, and yet they're among the least human tests you could devise: they put players in simplified situations where everything is known, every possible course of action is laid out for them, and the test is one of concentration and logic. We pass far greater tests daily, when we recognise a face in a crowd, when we dynamically balance in motion, when we predict the response our words and expressions will have on another sentient being, or when we do all of the above, effortlessly, at the same time. We don't think of these as challenging because they're so innately human, while playing Chess or Go seems far more impressive precisely because these games are more rigid and computational in nature. There's an irony in making a board game one of the 'grand challenges' of AI, and it surprises me that more people don't see it.

algorithm, artificial intelligence, machine learning

#artificialintelligence

Country: Europe > United Kingdom (0.42)

Industry: